QSAR studies on hepatitis C virus NS5A protein tetracyclic inhibitors in wild type and mutants by CoMFA and CoMSIA

Qin, Z.J.; Yan, A.X.*
SAR and QSAR in Environmental Research, 2020, 31 (4), 281–311

    Several 3D-QSAR models were built from 196 hepatitis C virus (HCV) NS5A protein inhibitors. EC90 bioactivity values against three forms of the target, the wild type (GT1a) and two mutants (GT1a Y93H and GT1a L31V), were collected to build three datasets. The programs OMEGA and ROCS were used to generate three-dimensional conformations and to align the molecules in each dataset, respectively. Each dataset was randomly divided into a training set and a test set three times, to reduce the chance effects of relying on a single random split. QSAR models were built with comparative molecular field analysis (CoMFA) and comparative molecular similarity indices analysis (CoMSIA). For the datasets GT1a, GT1a Y93H, and GT1a L31V, the best models (CoMFA-INDX, CoMSIA-SEHA, and CoMSIA-SEHA) gave test-set r² values of 0.682 ± 0.033, 0.779 ± 0.036, and 0.782 ± 0.022, respectively. From the contour maps of the three best models, we summarized the substituents that are favourable and unfavourable for bioactivity on the tetracyclic core, the Z group, the proline group, and the valine group of the inhibitors. We speculate that the mutations could change the electrostatic surface of the wild-type active pocket. In addition, we used ECFP analyses to identify important substructures, which gives an intuitive understanding of the QSAR model results.
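
The repeated random partitioning described above can be sketched as follows. This is only an illustration, not the authors' procedure: the function name `repeated_splits`, the 80/20 ratio, the seeds, and the integer compound identifiers are all assumptions, since the abstract states only that each dataset was split randomly three times.

```python
import random

def repeated_splits(ids, n_repeats=3, test_fraction=0.2, base_seed=0):
    """Return n_repeats independent (train, test) partitions of ids.

    test_fraction and base_seed are illustrative assumptions; the
    source does not give the split ratio or the randomization details.
    """
    ids = list(ids)
    n_test = max(1, round(len(ids) * test_fraction))
    splits = []
    for k in range(n_repeats):
        rng = random.Random(base_seed + k)  # one reproducible RNG per repeat
        shuffled = ids[:]
        rng.shuffle(shuffled)
        test = sorted(shuffled[:n_test])
        train = sorted(shuffled[n_test:])
        splits.append((train, test))
    return splits

# Hypothetical identifiers standing in for the 192 GT1a compounds.
compound_ids = list(range(1, 193))
splits = repeated_splits(compound_ids)
for train, test in splits:
    print(len(train), len(test))
```

A model would then be fitted on each training set and scored on the matching test set, so that the reported statistic reflects all three partitions and not one possibly lucky split.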

QSAR model performance on the three datasets (GT1a, GT1a Y93H, GT1a L31V)

Model name  Dataset                     Descriptors  Training set r²  Training set SEE  Test set r²    Test set SEE
Model 1     GT1a (192 inhibitors)       CoMFA-INDX   0.727 ± 0.006    0.334 ± 0.001     0.682 ± 0.033  0.418 ± 0.023
Model 2     GT1a Y93H (180 inhibitors)  CoMSIA-SEHA  0.832 ± 0.007    0.506 ± 0.012     0.779 ± 0.036  0.608 ± 0.059
Model 3     GT1a L31V (159 inhibitors)  CoMSIA-SEHA  0.871 ± 0.005    0.408 ± 0.010     0.782 ± 0.022  0.560 ± 0.047
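
Entries of the form "value ± spread" can be produced along the lines of the sketch below. It rests on several assumptions not confirmed by the source: r² is taken as the coefficient of determination, SEE as the root-mean-square residual without a degrees-of-freedom correction, and the ± term as the sample standard deviation over the three splits. The paper may define any of these differently, and all numbers in the example are made up.

```python
import math
import statistics

def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_y = statistics.fmean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

def see(y_true, y_pred):
    """Root-mean-square residual, used here as a stand-in for SEE (assumption)."""
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return math.sqrt(ss_res / len(y_true))

def summarize(values):
    """Format per-split statistics as 'mean ± sample standard deviation'."""
    return f"{statistics.fmean(values):.3f} ± {statistics.stdev(values):.3f}"

# Made-up activities and predictions for a single split.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(r_squared(y_true, y_pred), see(y_true, y_pred))

# Made-up test-set r² values from three random splits.
print(summarize([0.70, 0.65, 0.69]))
```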

Principal project members

Zijian Qin (秦子健)

PhD student

zijianqin@foxmail.com